In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!
Note: Once you have completed all the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to HTML, all the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.
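If you prefer the command line, the same export can be run from a notebook cell; this is a minimal sketch, assuming the notebook file is named landmark_classification.ipynb (substitute your actual filename):
!jupyter nbconvert --to html landmark_classification.ipynb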
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.
The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.
Photo sharing and photo storage services like to have location data for each photo that is uploaded. With the location data, these services can build advanced features, such as automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. Although a photo's location can often be obtained by looking at the photo's metadata, many photos uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.
If no location metadata for an image is available, one way to infer the location is to detect and classify a discernible landmark in the image. Given the large number of landmarks across the world and the immense volume of images that are uploaded to photo sharing services, using human judgement to classify these landmarks would not be feasible.
In this notebook, you will take the first steps towards addressing this problem by building models to automatically predict the location of the image based on any landmarks depicted in the image. At the end of this project, your code will accept any user-supplied image as input and suggest the top k most relevant landmarks from 50 possible landmarks from across the world. The image below displays a potential sample output of your finished project.

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.
Note: if you are using the Udacity workspace, YOU CAN SKIP THIS STEP. The dataset can be found in the /data folder and all required Python modules have been installed in the workspace.
Download the landmark dataset.
Unzip the folder and place it in this project's home directory, at the location /landmark_images.
Install the following Python modules:
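The module list itself is not reproduced here; judging from the imports in the first code cell of this notebook, a minimal environment would need at least the packages below (a hedged guess based on those imports, not an official list):
pip install numpy matplotlib torch torchvision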
In this step, you will create a CNN that classifies landmarks. You must create your CNN from scratch (so, you can't use transfer learning yet!), and you must attain a test accuracy of at least 20%.
Although 20% may seem low at first glance, it seems more reasonable after realizing how difficult of a problem this is. Many times, an image that is taken at a landmark captures a fairly mundane view of an animal or plant, like in the following picture.

Just by looking at that image alone, would you have been able to guess that it was taken at the Haleakalā National Park in Hawaii?
An accuracy of 20% is significantly better than random guessing, which would provide an accuracy of just 2%. In Step 2 of this notebook, you will have the opportunity to greatly improve accuracy by using transfer learning to create a CNN.
Remember that practice is far ahead of theory in deep learning. Experiment with many different architectures, and trust your intuition. And, of course, have fun!
# Import all required packages; this also verifies that everything is installed
# standard packages
import os
import numpy as np
# plotting
import matplotlib.pyplot as plt
%matplotlib inline
# pytorch
import torch
torch.manual_seed(42)
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
import torch.nn as nn
from torchvision import datasets, transforms, models
Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.
All three of your data loaders should be accessible via a dictionary named loaders_scratch. Your train data loader should be at loaders_scratch['train'], your validation data loader should be at loaders_scratch['valid'], and your test data loader should be at loaders_scratch['test'].
You may find this documentation on custom datasets to be a useful resource. If you are interested in augmenting your training and/or validation data, check out the wide variety of transforms!
# Load the images from a folder where training and test images are in separate sub-folders
images_root = "landmark_images"
train_folder = os.path.join(images_root, "train")
test_folder = os.path.join(images_root, "test")
# use mean and std from imagenet
image_mean=[0.485, 0.456, 0.406]
image_std=[0.229, 0.224, 0.225]
to_tensor_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(image_mean, image_std),
])
# applied only to training set
train_transforms = transforms.Compose([
    transforms.Resize(300),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    # the next two transforms crop off the black bars from the random rotation
    transforms.CenterCrop(256),
    # pick a small part from the center for spatial variation
    transforms.RandomResizedCrop(224, scale=(0.9, 1.1), ratio=(3/4, 4/3)),
    to_tensor_transform,
])
# applied to testing and validation sets
test_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomHorizontalFlip(),
    # takes crops of the 4 corners plus center
    # we will later average the accuracy for all 5 images during testing to get a better sense of network quality
    # transforms.FiveCrop(224),
    # transforms.Lambda(lambda tensors: torch.stack([to_tensor_transform(t) for t in tensors]))
    transforms.CenterCrop(224),
    to_tensor_transform,
])
# transforms will be applied to the training set after splitting off the validation set
dataset_train = datasets.ImageFolder(train_folder, transform=train_transforms)
dataset_valid = datasets.ImageFolder(train_folder, transform=test_transforms)
print(f"Images in training set: {len(dataset_train)}")
dataset_test = datasets.ImageFolder(test_folder, transform=test_transforms)
print(f"Images in test set: {len(dataset_test)}")
train_classes = dataset_train.classes
test_classes = dataset_test.classes
# reduce number of classes for testing so we don't run out of memory
num_train_classes = None
if num_train_classes is not None:
    idx = [i for i in range(len(dataset_train)) if dataset_train.imgs[i][1] < num_train_classes]
    # build the appropriate subset
    train_classes = train_classes[:num_train_classes]
    dataset_train = torch.utils.data.Subset(dataset_train, idx)
    idx = [i for i in range(len(dataset_test)) if dataset_test.imgs[i][1] < num_train_classes]
    # build the appropriate subset
    test_classes = test_classes[:num_train_classes]
    dataset_test = torch.utils.data.Subset(dataset_test, idx)
# split the training data into training and validation set
# keeping it small since this removes images from training
valid_size = 0.05
# set this to a lower value to use fewer images (to combat out of memory errors)
train_size = 1 - valid_size
# get all indices and split them according to valid_size
train_indices = [idx for idx in range(len(dataset_train))]
np.random.shuffle(train_indices)
train_split_index = int(train_size * len(train_indices))
valid_split_index = int((train_size + valid_size) * len(train_indices))
# create subsets for the indices in each split
train_set = torch.utils.data.Subset(dataset_train, train_indices[:train_split_index])
valid_set = torch.utils.data.Subset(dataset_valid, train_indices[train_split_index:valid_split_index])
print(f"Images in training sampler: {len(train_set)}")
print(f"Images in validation sampler: {len(valid_set)}")
print(f"Images in test sampler: {len(dataset_test)}")
loaders_scratch = {
    'train': torch.utils.data.DataLoader(
        train_set,
        batch_size=32,
        shuffle=True,  # reshuffle the training batches every epoch
        pin_memory=True,
        num_workers=3,
    ),
    'valid': torch.utils.data.DataLoader(
        valid_set,
        batch_size=32,
        pin_memory=True,
        num_workers=3,
    ),
    'test': torch.utils.data.DataLoader(
        dataset_test,
        batch_size=32,
        shuffle=True,
        pin_memory=True,
        num_workers=3,
    ),
}
## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)
classes_scratch = {
    'train': train_classes,
    'valid': train_classes,
    'test': test_classes,
}
Images in training set: 4996
Images in test set: 1250
Images in training sampler: 4746
Images in validation sampler: 250
Images in test sampler: 1250
Question 1: Describe your chosen procedure for preprocessing the data.
Answer: For training I resize each image so that its shorter side is 300 px, apply a random horizontal flip and a rotation of up to 10 degrees, center-crop to 256 px to remove the black borders introduced by the rotation, and finally apply a RandomResizedCrop to 224x224 for slight spatial variation. Validation and test images are resized to 256 px and center-cropped to 224x224 (a random horizontal flip is applied there as well). All images are converted to tensors and normalized with the ImageNet mean and standard deviation; the 224x224 size matches the input size of VGG-style networks like the one used for transfer learning in Step 2.
Use the code cell below to retrieve a batch of images from your train data loader, display at least 5 images simultaneously, and label each displayed image with its class name (e.g., "Golden Gate Bridge").
Visualizing the output of your data loader is a great way to ensure that your data loading and preprocessing are working as expected.
def display_tensor_image(image, ax, norm=np.array([image_mean, image_std])):
    img = np.array(image)
    # with FiveCrop enabled, the test set yields a stack of 5 crops per image
    if len(img.shape) == 4:
        # take only the first crop to visualize
        img = img[0, :]
    # channels-first (C, H, W) to channels-last (H, W, C) for matplotlib
    img = img.transpose(1, 2, 0)
    if norm is not None:
        # undo the ImageNet normalization so colors display correctly
        img = (img * norm[1] + norm[0]).clip(0, 1)
    ax.imshow(img)
# visualize 5 images from each dataset
fig, axes = plt.subplots(3, 5, figsize=(16, 10))
for row_num, row_axes in enumerate(axes):
    # get one of the 3 data loaders
    name, loader = list(loaders_scratch.items())[row_num]
    images, labels = next(iter(loader))
    # get the classes for this loader from the prepared dict
    classes = classes_scratch[name]
    for col_num, ax in enumerate(row_axes):
        ax.set_title(f"{classes[labels[col_num]]} ({labels[col_num]})", fontsize=10)
        display_tensor_image(images[col_num], ax)
# useful variable that tells us whether we should use the GPU
use_cuda = torch.cuda.is_available()
use_cuda
True
Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_scratch, and fill in the function get_optimizer_scratch below.
# LogSoftmax is used in the network so we use NLLLoss here instead of CrossEntropyLoss
criterion_scratch = nn.NLLLoss()
def get_optimizer_scratch(model):
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    # optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
    # the scheduler allows us to reduce the learning rate after a number of epochs
    scheduler = StepLR(optimizer, step_size=20, gamma=0.5)
    # the train function below expects an (optimizer, scheduler) tuple
    return optimizer, scheduler
Create a CNN to classify images of landmarks. Use the template in the code cell below.
import torch.nn as nn
# define the CNN architecture
class Net(nn.Module):
    def __init__(self, output_classes=len(train_classes)):
        super(Net, self).__init__()
        self.output_classes = output_classes
        # some baseline constants for easier tweaking
        # start number of filters, will be doubled after each maxpool
        num_filters = 32
        # helper returning a block of one convolution, optional batch norm, ReLU, and maxpool
        get_conv_block = lambda fromM, toM, useNorm=False: filter(
            lambda op: op is not None, [
                # bias has no effect if batch norm is applied directly afterwards
                nn.Conv2d(int(num_filters * fromM), num_filters * toM, 3, padding=1, bias=(not useNorm)),
                nn.BatchNorm2d(num_features=num_filters * toM) if useNorm else None,
                nn.ReLU(),
                nn.MaxPool2d(2, 2),
            ])
        # using vgg as inspiration:
        self.features = nn.Sequential(
            # input 224x224
            nn.Conv2d(3, num_filters, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            # input 112x112
            *get_conv_block(1, 2),
            # input 56x56
            *get_conv_block(2, 4),
            # input 28x28
            *get_conv_block(4, 8),
            # input 14x14
            *get_conv_block(8, 8, True),
        )
        self.classifier = nn.Sequential(
            # input 7x7
            nn.Flatten(),
            nn.Linear(7 * 7 * num_filters * 8, 4096),
            nn.BatchNorm1d(num_features=4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            # memory constraints on the GPU don't allow me to add another linear layer
            nn.Linear(4096, output_classes),
            nn.LogSoftmax(dim=1),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

    def save(self, path):
        checkpoint = {
            "output_classes": self.output_classes,
            "state_dict": self.state_dict(),
        }
        torch.save(checkpoint, path)

    @staticmethod
    def load(path):
        checkpoint = torch.load(path)
        model = Net(output_classes=checkpoint["output_classes"])
        model.load_state_dict(checkpoint["state_dict"])
        return model
#-#-# Do NOT modify the code below this line. #-#-#
# instantiate the CNN
model_scratch = Net()
# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
Question 2: Outline the steps you took to get to your final CNN architecture and your reasoning at each step.
Answer: I used the VGG network as inspiration, although I had to stay even smaller than the smallest variant, VGG11: the GPU I use for training has only 4 GB of RAM, which limited the network size and the number of layers.
The 224x224 input needs to be reduced spatially while the number of filters increases. Five max-pooling layers halve the resolution down to 7x7, so I chose that depth.
Within each block I stick to a repeatable pattern of a Conv2d layer with a 3x3 kernel and 1 px of padding. Only the last Conv2d layer has batch normalization applied before the ReLU activation to improve training stability.
The number of filters doubles after each block, ending with 256 filters after the last one.
The final classification is done by two linear layers, which learn the mapping from the 7x7x256 convolution output to the 50 target classes. Batch normalization after the first linear layer is used to improve the stability and learning speed of the classifier.
A LogSoftmax layer as the last step converts the final activations into log-probabilities over the output classes, which matches the NLLLoss criterion chosen above.
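A quick way to validate the architecture described above is to push a dummy batch through it. A minimal sanity-check sketch (assuming the Net class from the cell above; the variable names are illustrative):
dummy_batch = torch.randn(2, 3, 224, 224)
net_check = Net(output_classes=50)
# eval mode so the BatchNorm layers use their (initialized) running statistics
net_check.eval()
with torch.no_grad():
    print(net_check(dummy_batch).shape)  # expected: torch.Size([2, 50])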
Implement your training algorithm in the code cell below. Save the final model parameters at the filepath stored in the variable save_path.
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize trackers for minimum validation loss and maximum accuracy
    valid_loss_min = np.inf
    accuracy_max = 0
    # the optimizer argument is an (optimizer, scheduler) tuple as returned by get_optimizer_scratch
    optimizer, scheduler = optimizer
    for epoch in range(1, n_epochs + 1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        ###################
        # train the model #
        ###################
        # set the module to training mode
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.to('cuda', non_blocking=True), target.to('cuda', non_blocking=True)
            ## find the loss and update the model parameters accordingly
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            ## record the average training loss: the incremental update keeps
            ## train_loss equal to the mean batch loss seen so far
            train_loss += ((1 / (batch_idx + 1)) * (loss.data.item() - train_loss))
        ######################
        # validate the model #
        ######################
        valid_total = 0
        valid_correct = 0
        with torch.no_grad():
            # set the model to evaluation mode
            model.eval()
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.to('cuda', non_blocking=True), target.to('cuda', non_blocking=True)
                output = model(data)
                loss = criterion(output, target)
                valid_loss += ((1 / (batch_idx + 1)) * (loss.data.item() - valid_loss))
                # calculate accuracy:
                # convert output log-probabilities to the predicted class
                pred = torch.exp(output).data.max(1, keepdim=True)[1]
                # compare predictions to the true labels
                valid_total += data.size(0)
                valid_correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        accuracy = valid_correct / valid_total
        lr = None
        if scheduler is not None:
            scheduler.step()
            lr = scheduler.get_last_lr()[0]
        # print training/validation statistics
        print(" ".join([
            f"Epoch: {str(epoch).zfill(2)}",
            f"LR: {round(lr, 4)}" if lr is not None else "",
            f"Training Loss: {round(train_loss, 6)}",
            f"Validation Loss: {round(valid_loss, 6)}",
            f"Accuracy: {round(100 * accuracy, 1)}%",
        ]))
        # if the validation accuracy has increased, save the model at the filepath stored in save_path
        # if valid_loss < valid_loss_min:
        if accuracy > accuracy_max:
            print("Accuracy increased, saving model")
            # use the model's own save function if it exists
            save = getattr(model, "save", None)
            if callable(save):
                save(save_path)
            else:
                torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
            accuracy_max = accuracy
    return model
Use the code cell below to define a custom weight initialization, and then train with your weight initialization for a few epochs. Make sure that neither the training loss nor validation loss is nan.
Later on, you will be able to see how this compares to training with PyTorch's default weight initialization.
def custom_weight_init(m):
    """Initialize weights using a normal distribution"""
    reset_parameters = getattr(m, 'reset_parameters', None)
    classname = m.__class__.__name__
    if classname.find('Linear') != -1:
        # std dev is 1 / sqrt(n), with n the number of input features
        std = 1.0 / np.sqrt(m.in_features)
        m.weight.data.normal_(0, std)
        m.bias.data.fill_(0)
    elif callable(reset_parameters):
        m.reset_parameters()
#-#-# Do NOT modify the code below this line. #-#-#
model_scratch.apply(custom_weight_init)
model_scratch = train(
    20,
    loaders_scratch,
    model_scratch,
    get_optimizer_scratch(model_scratch),
    criterion_scratch,
    use_cuda,
    'ignore.pt',
)
Epoch: 01 LR: 0.01 Training Loss: 0.061092 Validation Loss: 1.801314 Accuracy: 3.2%
Accuracy increased, saving model
Epoch: 02 LR: 0.01 Training Loss: 0.031041 Validation Loss: 0.476655 Accuracy: 9.2%
Accuracy increased, saving model
Epoch: 03 LR: 0.01 Training Loss: 0.024726 Validation Loss: 0.491876 Accuracy: 11.6%
Accuracy increased, saving model
Epoch: 04 LR: 0.01 Training Loss: 0.022297 Validation Loss: 0.508992 Accuracy: 14.8%
Accuracy increased, saving model
Epoch: 05 LR: 0.01 Training Loss: 0.020782 Validation Loss: 0.437183 Accuracy: 18.0%
Accuracy increased, saving model
Epoch: 06 LR: 0.01 Training Loss: 0.019301 Validation Loss: 0.455529 Accuracy: 20.4%
Accuracy increased, saving model
Epoch: 07 LR: 0.01 Training Loss: 0.017976 Validation Loss: 0.464502 Accuracy: 22.0%
Accuracy increased, saving model
Epoch: 08 LR: 0.01 Training Loss: 0.016948 Validation Loss: 0.513316 Accuracy: 18.8%
Epoch: 09 LR: 0.01 Training Loss: 0.015639 Validation Loss: 0.415097 Accuracy: 25.2%
Accuracy increased, saving model
Epoch: 10 LR: 0.01 Training Loss: 0.015109 Validation Loss: 0.460877 Accuracy: 22.0%
Epoch: 11 LR: 0.01 Training Loss: 0.014313 Validation Loss: 0.438757 Accuracy: 26.8%
Accuracy increased, saving model
Epoch: 12 LR: 0.01 Training Loss: 0.013178 Validation Loss: 0.464127 Accuracy: 22.4%
Epoch: 13 LR: 0.01 Training Loss: 0.012188 Validation Loss: 0.41697 Accuracy: 30.8%
Accuracy increased, saving model
Epoch: 14 LR: 0.01 Training Loss: 0.011335 Validation Loss: 0.50267 Accuracy: 23.6%
Epoch: 15 LR: 0.01 Training Loss: 0.010707 Validation Loss: 0.544256 Accuracy: 23.6%
Epoch: 16 LR: 0.01 Training Loss: 0.009896 Validation Loss: 0.661789 Accuracy: 24.8%
Epoch: 17 LR: 0.01 Training Loss: 0.009425 Validation Loss: 0.526922 Accuracy: 25.2%
Epoch: 18 LR: 0.01 Training Loss: 0.008379 Validation Loss: 0.491528 Accuracy: 29.2%
Epoch: 19 LR: 0.01 Training Loss: 0.008071 Validation Loss: 0.491826 Accuracy: 29.2%
Epoch: 20 LR: 0.005 Training Loss: 0.00867 Validation Loss: 0.500014 Accuracy: 30.4%
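As an aside, PyTorch's nn.init module provides standard initialization schemes. A minimal sketch of an alternative initializer, for comparison only (the helper name kaiming_weight_init is hypothetical, and this is not what produced the output above):
def kaiming_weight_init(m):
    # Kaiming (He) initialization is a common default for ReLU networks
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        nn.init.zeros_(m.bias)
# usage: model_scratch.apply(kaiming_weight_init)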
Run the next code cell to train your model.
## you may change the number of epochs if you'd like,
## but changing it is not required
num_epochs = 100
#-#-# Do NOT modify the code below this line. #-#-#
# function to re-initialize a model with pytorch's default weight initialization
def default_weight_init(m):
    reset_parameters = getattr(m, 'reset_parameters', None)
    if callable(reset_parameters):
        m.reset_parameters()
# reset the model parameters
model_scratch.apply(default_weight_init)
# train the model
model_scratch = train(
    num_epochs,
    loaders_scratch,
    model_scratch,
    get_optimizer_scratch(model_scratch),
    criterion_scratch,
    use_cuda,
    'model_scratch.pt',
)
Epoch: 01 LR: 0.01 Training Loss: 0.060782 Validation Loss: 0.75409 Accuracy: 2.0%
Accuracy increased, saving model
Epoch: 02 LR: 0.01 Training Loss: 0.02976 Validation Loss: 0.474305 Accuracy: 10.8%
Accuracy increased, saving model
Epoch: 03 LR: 0.01 Training Loss: 0.02452 Validation Loss: 0.445893 Accuracy: 12.8%
Accuracy increased, saving model
Epoch: 04 LR: 0.01 Training Loss: 0.022645 Validation Loss: 0.429291 Accuracy: 15.2%
Accuracy increased, saving model
Epoch: 05 LR: 0.01 Training Loss: 0.021664 Validation Loss: 0.438103 Accuracy: 16.0%
Accuracy increased, saving model
Epoch: 06 LR: 0.01 Training Loss: 0.020641 Validation Loss: 0.511652 Accuracy: 12.0%
Epoch: 07 LR: 0.01 Training Loss: 0.019484 Validation Loss: 0.525746 Accuracy: 14.4%
Epoch: 08 LR: 0.01 Training Loss: 0.018544 Validation Loss: 0.557949 Accuracy: 18.8%
Accuracy increased, saving model
Epoch: 09 LR: 0.01 Training Loss: 0.017762 Validation Loss: 0.44124 Accuracy: 26.4%
Accuracy increased, saving model
Epoch: 10 LR: 0.01 Training Loss: 0.016658 Validation Loss: 0.520759 Accuracy: 16.8%
Epoch: 11 LR: 0.01 Training Loss: 0.015429 Validation Loss: 0.532277 Accuracy: 15.6%
Epoch: 12 LR: 0.01 Training Loss: 0.01443 Validation Loss: 0.548713 Accuracy: 14.8%
Epoch: 13 LR: 0.01 Training Loss: 0.013415 Validation Loss: 0.436339 Accuracy: 24.0%
Epoch: 14 LR: 0.01 Training Loss: 0.012257 Validation Loss: 0.45496 Accuracy: 24.8%
Epoch: 15 LR: 0.01 Training Loss: 0.011624 Validation Loss: 0.553433 Accuracy: 21.2%
Epoch: 16 LR: 0.01 Training Loss: 0.010747 Validation Loss: 0.525599 Accuracy: 24.0%
Epoch: 17 LR: 0.01 Training Loss: 0.009834 Validation Loss: 0.591141 Accuracy: 19.6%
Epoch: 18 LR: 0.01 Training Loss: 0.009125 Validation Loss: 0.559601 Accuracy: 26.4%
Epoch: 19 LR: 0.01 Training Loss: 0.008699 Validation Loss: 0.618802 Accuracy: 18.4%
Epoch: 20 LR: 0.005 Training Loss: 0.00848 Validation Loss: 0.59641 Accuracy: 21.2%
Epoch: 21 LR: 0.005 Training Loss: 0.006616 Validation Loss: 0.461483 Accuracy: 32.0%
Accuracy increased, saving model
Epoch: 22 LR: 0.005 Training Loss: 0.006859 Validation Loss: 0.463866 Accuracy: 30.4%
Epoch: 23 LR: 0.005 Training Loss: 0.005384 Validation Loss: 0.453245 Accuracy: 33.6%
Accuracy increased, saving model
Epoch: 24 LR: 0.005 Training Loss: 0.004005 Validation Loss: 0.461736 Accuracy: 36.0%
Accuracy increased, saving model
Epoch: 25 LR: 0.005 Training Loss: 0.00437 Validation Loss: 0.47761 Accuracy: 32.4%
Epoch: 26 LR: 0.005 Training Loss: 0.004233 Validation Loss: 0.505484 Accuracy: 33.2%
Epoch: 27 LR: 0.005 Training Loss: 0.003772 Validation Loss: 0.486554 Accuracy: 32.0%
Epoch: 28 LR: 0.005 Training Loss: 0.003619 Validation Loss: 0.472542 Accuracy: 33.2%
Epoch: 29 LR: 0.005 Training Loss: 0.003481 Validation Loss: 0.454623 Accuracy: 34.8%
Epoch: 30 LR: 0.005 Training Loss: 0.003287 Validation Loss: 0.506417 Accuracy: 32.4%
Epoch: 31 LR: 0.005 Training Loss: 0.003246 Validation Loss: 0.522302 Accuracy: 32.8%
Epoch: 32 LR: 0.005 Training Loss: 0.003144 Validation Loss: 0.525323 Accuracy: 30.8%
Epoch: 33 LR: 0.005 Training Loss: 0.002731 Validation Loss: 0.461028 Accuracy: 39.2%
Accuracy increased, saving model
Epoch: 34 LR: 0.005 Training Loss: 0.002279 Validation Loss: 0.507054 Accuracy: 36.8%
Epoch: 35 LR: 0.005 Training Loss: 0.002372 Validation Loss: 0.533133 Accuracy: 37.2%
Epoch: 36 LR: 0.005 Training Loss: 0.00225 Validation Loss: 0.472693 Accuracy: 35.2%
Epoch: 37 LR: 0.005 Training Loss: 0.002153 Validation Loss: 0.494113 Accuracy: 37.6%
Epoch: 38 LR: 0.005 Training Loss: 0.002036 Validation Loss: 0.507891 Accuracy: 38.8%
Epoch: 39 LR: 0.005 Training Loss: 0.002049 Validation Loss: 0.524464 Accuracy: 39.2%
Epoch: 40 LR: 0.0025 Training Loss: 0.001708 Validation Loss: 0.519238 Accuracy: 36.8%
Epoch: 41 LR: 0.0025 Training Loss: 0.001421 Validation Loss: 0.4948 Accuracy: 40.0%
Accuracy increased, saving model
Epoch: 42 LR: 0.0025 Training Loss: 0.001075 Validation Loss: 0.513852 Accuracy: 40.0%
Epoch: 43 LR: 0.0025 Training Loss: 0.000923 Validation Loss: 0.531167 Accuracy: 39.6%
Epoch: 44 LR: 0.0025 Training Loss: 0.000945 Validation Loss: 0.541162 Accuracy: 40.4%
Accuracy increased, saving model
Epoch: 45 LR: 0.0025 Training Loss: 0.000849 Validation Loss: 0.517836 Accuracy: 43.6%
Accuracy increased, saving model
Epoch: 46 LR: 0.0025 Training Loss: 0.00074 Validation Loss: 0.537369 Accuracy: 39.2%
Epoch: 47 LR: 0.0025 Training Loss: 0.000596 Validation Loss: 0.543542 Accuracy: 39.2%
Epoch: 48 LR: 0.0025 Training Loss: 0.000635 Validation Loss: 0.560875 Accuracy: 41.2%
Epoch: 49 LR: 0.0025 Training Loss: 0.000597 Validation Loss: 0.552855 Accuracy: 40.4%
Epoch: 50 LR: 0.0025 Training Loss: 0.000626 Validation Loss: 0.51825 Accuracy: 41.6%
Epoch: 51 LR: 0.0025 Training Loss: 0.000571 Validation Loss: 0.567665 Accuracy: 38.8%
Epoch: 52 LR: 0.0025 Training Loss: 0.000488 Validation Loss: 0.53488 Accuracy: 39.2%
Epoch: 53 LR: 0.0025 Training Loss: 0.000535 Validation Loss: 0.540461 Accuracy: 40.0%
Epoch: 54 LR: 0.0025 Training Loss: 0.000509 Validation Loss: 0.569437 Accuracy: 40.4%
Epoch: 55 LR: 0.0025 Training Loss: 0.000497 Validation Loss: 0.561086 Accuracy: 43.2%
Epoch: 56 LR: 0.0025 Training Loss: 0.00046 Validation Loss: 0.553657 Accuracy: 40.4%
Epoch: 57 LR: 0.0025 Training Loss: 0.000408 Validation Loss: 0.555358 Accuracy: 40.0%
Epoch: 58 LR: 0.0025 Training Loss: 0.000427 Validation Loss: 0.573395 Accuracy: 39.6%
Epoch: 59 LR: 0.0025 Training Loss: 0.000394 Validation Loss: 0.552861 Accuracy: 40.0%
Epoch: 60 LR: 0.0013 Training Loss: 0.000481 Validation Loss: 0.571575 Accuracy: 39.6%
Epoch: 61 LR: 0.0013 Training Loss: 0.000403 Validation Loss: 0.563044 Accuracy: 41.2%
Epoch: 62 LR: 0.0013 Training Loss: 0.000405 Validation Loss: 0.563809 Accuracy: 40.8%
Epoch: 63 LR: 0.0013 Training Loss: 0.000324 Validation Loss: 0.57939 Accuracy: 38.4%
Epoch: 64 LR: 0.0013 Training Loss: 0.000283 Validation Loss: 0.562886 Accuracy: 41.2%
Epoch: 65 LR: 0.0013 Training Loss: 0.000253 Validation Loss: 0.560779 Accuracy: 39.6%
Epoch: 66 LR: 0.0013 Training Loss: 0.000241 Validation Loss: 0.547232 Accuracy: 41.2%
Epoch: 67 LR: 0.0013 Training Loss: 0.000237 Validation Loss: 0.568306 Accuracy: 38.8%
Epoch: 68 LR: 0.0013 Training Loss: 0.000226 Validation Loss: 0.565051 Accuracy: 38.8%
Epoch: 69 LR: 0.0013 Training Loss: 0.000213 Validation Loss: 0.556234 Accuracy: 40.0%
Epoch: 70 LR: 0.0013 Training Loss: 0.000223 Validation Loss: 0.551678 Accuracy: 40.8%
Epoch: 71 LR: 0.0013 Training Loss: 0.000196 Validation Loss: 0.572504 Accuracy: 39.6%
Epoch: 72 LR: 0.0013 Training Loss: 0.00021 Validation Loss: 0.576335 Accuracy: 41.6%
Epoch: 73 LR: 0.0013 Training Loss: 0.000211 Validation Loss: 0.55445 Accuracy: 40.4%
Epoch: 74 LR: 0.0013 Training Loss: 0.000202 Validation Loss: 0.566059 Accuracy: 41.2%
Epoch: 75 LR: 0.0013 Training Loss: 0.000225 Validation Loss: 0.581659 Accuracy: 40.4%
Epoch: 76 LR: 0.0013 Training Loss: 0.000211 Validation Loss: 0.565855 Accuracy: 42.0%
Epoch: 77 LR: 0.0013 Training Loss: 0.000171 Validation Loss: 0.578729 Accuracy: 41.6%
Epoch: 78 LR: 0.0013 Training Loss: 0.000177 Validation Loss: 0.557453 Accuracy: 40.0%
Epoch: 79 LR: 0.0013 Training Loss: 0.000191 Validation Loss: 0.572661 Accuracy: 41.6%
Epoch: 80 LR: 0.0006 Training Loss: 0.000169 Validation Loss: 0.559091 Accuracy: 40.4%
Epoch: 81 LR: 0.0006 Training Loss: 0.000136 Validation Loss: 0.573927 Accuracy: 40.8%
Epoch: 82 LR: 0.0006 Training Loss: 0.00016 Validation Loss: 0.551147 Accuracy: 41.2%
Epoch: 83 LR: 0.0006 Training Loss: 0.00016 Validation Loss: 0.569413 Accuracy: 42.0%
Epoch: 84 LR: 0.0006 Training Loss: 0.000123 Validation Loss: 0.587144 Accuracy: 39.6%
Epoch: 85 LR: 0.0006 Training Loss: 0.000111 Validation Loss: 0.56281 Accuracy: 40.4%
Epoch: 86 LR: 0.0006 Training Loss: 0.000151 Validation Loss: 0.562854 Accuracy: 40.0%
Epoch: 87 LR: 0.0006 Training Loss: 0.000111 Validation Loss: 0.587041 Accuracy: 42.8%
Epoch: 88 LR: 0.0006 Training Loss: 0.000122 Validation Loss: 0.594697 Accuracy: 43.2%
Epoch: 89 LR: 0.0006 Training Loss: 0.000108 Validation Loss: 0.599435 Accuracy: 40.0%
Epoch: 90 LR: 0.0006 Training Loss: 0.000115 Validation Loss: 0.596504 Accuracy: 41.6%
Epoch: 91 LR: 0.0006 Training Loss: 0.000129 Validation Loss: 0.590371 Accuracy: 39.6%
Epoch: 92 LR: 0.0006 Training Loss: 0.000104 Validation Loss: 0.572197 Accuracy: 41.6%
Epoch: 93 LR: 0.0006 Training Loss: 0.000115 Validation Loss: 0.58335 Accuracy: 42.4%
Epoch: 94 LR: 0.0006 Training Loss: 0.000109 Validation Loss: 0.583494 Accuracy: 42.8%
Epoch: 95 LR: 0.0006 Training Loss: 0.0001 Validation Loss: 0.586494 Accuracy: 39.6%
Epoch: 96 LR: 0.0006 Training Loss: 8.8e-05 Validation Loss: 0.584852 Accuracy: 41.6%
Epoch: 97 LR: 0.0006 Training Loss: 8.8e-05 Validation Loss: 0.592673 Accuracy: 42.8%
Epoch: 98 LR: 0.0006 Training Loss: 8.3e-05 Validation Loss: 0.572898 Accuracy: 40.4%
Epoch: 99 LR: 0.0006 Training Loss: 9.2e-05 Validation Loss: 0.581918 Accuracy: 38.8%
Epoch: 100 LR: 0.0003 Training Loss: 8.5e-05 Validation Loss: 0.583887 Accuracy: 41.6%
Run the code cell below to try out your model on the test dataset of landmark images and to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 20%.
def test(loaders, model, criterion, use_cuda):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.
    with torch.no_grad():
        # set the module to evaluation mode
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['test']):
            # move to GPU
            if use_cuda:
                data, target = data.to('cuda', non_blocking=True), target.to('cuda', non_blocking=True)
            # with FiveCrop enabled, the test data contains multiple crops of each image;
            # we forward all of them through the model and average the loss
            if len(data.shape) > 4:
                data = data.reshape((-1, 3, 224, 224))
                # after the reshape, the 5 crops of each image are adjacent,
                # so each target has to be repeated 5 times in place
                target = target.repeat_interleave(5)
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # update the average test loss
            test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - test_loss))
            # convert output log-probabilities to the predicted class
            pred = torch.exp(output).data.max(1, keepdim=True)[1]
            # compare predictions to the true labels
            correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
            total += data.size(0)
    print("\n".join([
        f"Test Loss: {round(test_loss, 6)}",
        f"Test Accuracy: {int(100. * correct / total)}% ({int(correct)}/{int(total)})",
    ]))
# load the model that got the best validation accuracy
# model_scratch.load_state_dict(torch.load('model_scratch.pt'))
# model_scratch = Net.load('ignore.pt')
model_scratch = Net.load('model_scratch.pt')
if use_cuda:
    model_scratch.cuda()
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
Test Loss: 4.558631
Test Accuracy: 38% (484/1250)
You will now use transfer learning to create a CNN that can identify landmarks from images. Your CNN must attain at least 60% accuracy on the test set.
Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.
All three of your data loaders should be accessible via a dictionary named loaders_transfer. Your train data loader should be at loaders_transfer['train'], your validation data loader should be at loaders_transfer['valid'], and your test data loader should be at loaders_transfer['test'].
If you like, you are welcome to use the same data loaders from the previous step, when you created a CNN from scratch.
# useful variable that tells us whether we should use the GPU
use_cuda = torch.cuda.is_available()
# use_cuda = False
use_cuda
True
# Load the images from a folder where training and test images are in separate sub-folders
images_root = "landmark_images"
train_folder = os.path.join(images_root, "train")
test_folder = os.path.join(images_root, "test")
# use mean and std from imagenet
image_mean=[0.485, 0.456, 0.406]
image_std=[0.229, 0.224, 0.225]
to_tensor_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(image_mean, image_std),
])
# applied only to training set
train_transforms = transforms.Compose([
    transforms.Resize(300),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    # the next two transforms crop off the black bars from the random rotation
    transforms.CenterCrop(256),
    # pick a small part from the center for spatial variation
    transforms.RandomResizedCrop(224, scale=(0.9, 1.1), ratio=(3/4, 4/3)),
    to_tensor_transform,
])
# applied to testing and validation sets
test_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomHorizontalFlip(),
    # takes crops of the 4 corners plus center
    # we will later average the accuracy for all 5 images during testing to get a better sense of network quality
    # transforms.FiveCrop(224),
    # transforms.Lambda(lambda tensors: torch.stack([to_tensor_transform(t) for t in tensors]))
    transforms.CenterCrop(224),
    to_tensor_transform,
])
# transforms will be applied to the training set after splitting off the validation set
dataset_train = datasets.ImageFolder(train_folder, transform=train_transforms)
dataset_valid = datasets.ImageFolder(train_folder, transform=test_transforms)
print(f"Images in training set: {len(dataset_train)}")
dataset_test = datasets.ImageFolder(test_folder, transform=test_transforms)
print(f"Images in test set: {len(dataset_test)}")
train_classes = dataset_train.classes
test_classes = dataset_test.classes
# reduce number of classes for testing so we don't run out of memory
num_train_classes = None
if num_train_classes is not None:
    idx = [i for i in range(len(dataset_train)) if dataset_train.imgs[i][1] < num_train_classes]
    # build the appropriate subset
    train_classes = train_classes[:num_train_classes]
    dataset_train = torch.utils.data.Subset(dataset_train, idx)
    idx = [i for i in range(len(dataset_test)) if dataset_test.imgs[i][1] < num_train_classes]
    # build the appropriate subset
    test_classes = test_classes[:num_train_classes]
    dataset_test = torch.utils.data.Subset(dataset_test, idx)
# split the training data into training and validation set
# keeping it small since this removes images from training
valid_size = 0.05
# set this to a lower value to use fewer images (to combat out of memory errors)
train_size = 1 - valid_size
# get all indices and split them according to valid_size
# also shuffle randomly so the validation set is not just the last few classes
train_indices = [idx for idx in range(len(dataset_train))]
np.random.shuffle(train_indices)
train_split_index = int(train_size * len(train_indices))
valid_split_index = int((train_size + valid_size) * len(train_indices))
# create subsets for the indices in each split
train_set = torch.utils.data.Subset(dataset_train, train_indices[:train_split_index])
valid_set = torch.utils.data.Subset(dataset_valid, train_indices[train_split_index:valid_split_index])
print(f"Images in training sampler: {len(train_set)}")
print(f"Images in validation sampler: {len(valid_set)}")
print(f"Images in test sampler: {len(dataset_test)}")
loaders_transfer = {
    'train': torch.utils.data.DataLoader(
        train_set,
        batch_size=24,
        shuffle=True,  # reshuffle the training batches every epoch
        pin_memory=True,
        num_workers=3,
    ),
    'valid': torch.utils.data.DataLoader(
        valid_set,
        batch_size=24,
        pin_memory=True,
        num_workers=3,
    ),
    'test': torch.utils.data.DataLoader(
        dataset_test,
        batch_size=24,
        shuffle=True,
        pin_memory=True,
        num_workers=3,
    ),
}
## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)
classes_transfer = {
    'train': train_classes,
    'valid': train_classes,
    'test': test_classes,
}
Images in training set: 4996
Images in test set: 1250
Images in training sampler: 4746
Images in validation sampler: 250
Images in test sampler: 1250
Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer, and fill in the function get_optimizer_transfer below.
criterion_transfer = nn.CrossEntropyLoss()
def get_optimizer_transfer(model):
    # optimizer = optim.Adam(model.classifier.parameters(), lr=0.01)
    # only the (unfrozen) classifier parameters are optimized
    optimizer = optim.SGD(model.classifier.parameters(), lr=0.005)
    scheduler = StepLR(optimizer, step_size=10, gamma=0.5)
    # the train function expects an (optimizer, scheduler) tuple
    return optimizer, scheduler
Use transfer learning to create a CNN to classify images of landmarks. Use the code cell below, and save your initialized model as the variable model_transfer.
# using vgg16 as feature detector
model_transfer = models.vgg16(pretrained=True)
# AlexNet should be smaller but is not quite as good
# model_transfer = models.alexnet(pretrained=True)
# print(model_transfer)
# layer 6 in the classifier is the last layer
# print(model_transfer.classifier[6].in_features)
# print(model_transfer.classifier[6].out_features)
# replace it with our own layer mapping to the target classes and set the classifier to be trained again
model_transfer.classifier[6] = nn.Linear(4096, len(train_classes))
# Freeze training for all "features" layers
for param in model_transfer.features.parameters():
    param.requires_grad = False
# reset the weights of the classifier layers; this is important, otherwise the network won't learn our landmarks!
for layer in model_transfer.classifier:
    reset_parameters = getattr(layer, 'reset_parameters', None)
    classname = layer.__class__.__name__
    if classname.find('Linear') != -1:
        # std dev is 1 / sqrt(n), with n the number of input features
        std = 1.0 / np.sqrt(layer.in_features)
        layer.weight.data.normal_(0, std)
        layer.bias.data.fill_(0)
    elif callable(reset_parameters):
        layer.reset_parameters()
# set all classifier layers to be trainable
for param in model_transfer.classifier.parameters():
    param.requires_grad = True
print("Classifier:", model_transfer.classifier)
#-#-# Do NOT modify the code below this line. #-#-#
if use_cuda:
    model_transfer = model_transfer.cuda()
Classifier: Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=50, bias=True)
)
Question 3: Outline the steps you took to get to your final CNN architecture and your reasoning at each step. Describe why you think the architecture is suitable for the current problem.
Answer:
I chose VGG16 as the feature detector network since it is still reasonably sized and performs well on image classification tasks.
Gradient calculation for the feature detector layers is disabled, since we want to keep the pretrained feature detector unchanged. I replaced the last linear layer of the classifier with one that outputs the correct number of target classes (50) and then reinitialized all classifier weights with a normal distribution. Without this reinitialization step the network does not learn and achieves very poor accuracy.
I set the batch size to 24 because this still allows me to train the network on a 4 GB GPU (it runs out of memory with a batch size of 32).
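A quick check of this setup is to count trainable versus total parameters; a minimal sketch, assuming the model_transfer variable from the cells above:
trainable = sum(p.numel() for p in model_transfer.parameters() if p.requires_grad)
total = sum(p.numel() for p in model_transfer.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,}")
Only the classifier parameters should show up as trainable, confirming that the frozen VGG16 feature extractor will not be updated.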
Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_transfer.pt'.
# train the model
model_transfer = train(
    50,
    loaders_transfer,
    model_transfer,
    get_optimizer_transfer(model_transfer),
    criterion_transfer,
    use_cuda,
    'model_transfer.pt',
)
Epoch: 01 LR: 0.005 Training Loss: 0.016494 Validation Loss: 0.17559 Accuracy: 55.6%
Accuracy increased, saving model
Epoch: 02 LR: 0.005 Training Loss: 0.010599 Validation Loss: 0.130661 Accuracy: 62.8%
Accuracy increased, saving model
Epoch: 03 LR: 0.005 Training Loss: 0.008086 Validation Loss: 0.113138 Accuracy: 66.0%
Accuracy increased, saving model
Epoch: 04 LR: 0.005 Training Loss: 0.006585 Validation Loss: 0.100412 Accuracy: 71.2%
Accuracy increased, saving model
Epoch: 05 LR: 0.005 Training Loss: 0.005483 Validation Loss: 0.098528 Accuracy: 70.4%
Epoch: 06 LR: 0.005 Training Loss: 0.004781 Validation Loss: 0.094689 Accuracy: 71.6%
Accuracy increased, saving model
Epoch: 07 LR: 0.005 Training Loss: 0.004126 Validation Loss: 0.094902 Accuracy: 72.4%
Accuracy increased, saving model
Epoch: 08 LR: 0.005 Training Loss: 0.003539 Validation Loss: 0.093466 Accuracy: 72.8%
Accuracy increased, saving model
Epoch: 09 LR: 0.005 Training Loss: 0.003115 Validation Loss: 0.096293 Accuracy: 72.4%
Epoch: 10 LR: 0.0025 Training Loss: 0.002879 Validation Loss: 0.087415 Accuracy: 73.6%
Accuracy increased, saving model
Epoch: 11 LR: 0.0025 Training Loss: 0.00247 Validation Loss: 0.090762 Accuracy: 74.4%
Accuracy increased, saving model
Epoch: 12 LR: 0.0025 Training Loss: 0.002175 Validation Loss: 0.091993 Accuracy: 75.2%
Accuracy increased, saving model
Epoch: 13 LR: 0.0025 Training Loss: 0.002071 Validation Loss: 0.090603 Accuracy: 73.2%
Epoch: 14 LR: 0.0025 Training Loss: 0.002029 Validation Loss: 0.090621 Accuracy: 75.2%
Epoch: 15 LR: 0.0025 Training Loss: 0.001866 Validation Loss: 0.089301 Accuracy: 75.2%
Epoch: 16 LR: 0.0025 Training Loss: 0.001733 Validation Loss: 0.092701 Accuracy: 75.2%
Epoch: 17 LR: 0.0025 Training Loss: 0.001595 Validation Loss: 0.090562 Accuracy: 76.4%
Accuracy increased, saving model
Epoch: 18 LR: 0.0025 Training Loss: 0.00153 Validation Loss: 0.092226 Accuracy: 76.4%
Epoch: 19 LR: 0.0025 Training Loss: 0.001469 Validation Loss: 0.094242 Accuracy: 75.2%
Epoch: 20 LR: 0.0013 Training Loss: 0.001316 Validation Loss: 0.095493 Accuracy: 73.6%
Epoch: 21 LR: 0.0013 Training Loss: 0.00128 Validation Loss: 0.089717 Accuracy: 75.6%
Epoch: 22 LR: 0.0013 Training Loss: 0.001211 Validation Loss: 0.095193 Accuracy: 73.6%
Epoch: 23 LR: 0.0013 Training Loss: 0.001191 Validation Loss: 0.09465 Accuracy: 75.2%
Epoch: 24 LR: 0.0013 Training Loss: 0.001198 Validation Loss: 0.094485 Accuracy: 76.4%
Epoch: 25 LR: 0.0013 Training Loss: 0.001136 Validation Loss: 0.091568 Accuracy: 76.4%
Epoch: 26 LR: 0.0013 Training Loss: 0.001108 Validation Loss: 0.094566 Accuracy: 74.0%
Epoch: 27 LR: 0.0013 Training Loss: 0.001093 Validation Loss: 0.096453 Accuracy: 73.2%
Epoch: 28 LR: 0.0013 Training Loss: 0.001058 Validation Loss: 0.09495 Accuracy: 75.6%
Epoch: 29 LR: 0.0013 Training Loss: 0.00106 Validation Loss: 0.097971 Accuracy: 74.0%
Epoch: 30 LR: 0.0006 Training Loss: 0.000988 Validation Loss: 0.094675 Accuracy: 76.8%
Accuracy increased, saving model
Epoch: 31 LR: 0.0006 Training Loss: 0.000952 Validation Loss: 0.09348 Accuracy: 76.4%
Epoch: 32 LR: 0.0006 Training Loss: 0.000958 Validation Loss: 0.09825 Accuracy: 74.4%
Epoch: 33 LR: 0.0006 Training Loss: 0.000947 Validation Loss: 0.095455 Accuracy: 75.2%
Epoch: 34 LR: 0.0006 Training Loss: 0.000928 Validation Loss: 0.096032 Accuracy: 76.8%
Epoch: 35 LR: 0.0006 Training Loss: 0.000911 Validation Loss: 0.094482 Accuracy: 76.4%
Epoch: 36 LR: 0.0006 Training Loss: 0.000894 Validation Loss: 0.101178 Accuracy: 74.8%
Epoch: 37 LR: 0.0006 Training Loss: 0.000907 Validation Loss: 0.093968 Accuracy: 74.8%
Epoch: 38 LR: 0.0006 Training Loss: 0.000821 Validation Loss: 0.097174 Accuracy: 76.4%
Epoch: 39 LR: 0.0006 Training Loss: 0.000877 Validation Loss: 0.096248 Accuracy: 75.2%
Epoch: 40 LR: 0.0003 Training Loss: 0.000867 Validation Loss: 0.096354 Accuracy: 76.0%
Epoch: 41 LR: 0.0003 Training Loss: 0.000879 Validation Loss: 0.098225 Accuracy: 73.6%
Epoch: 42 LR: 0.0003 Training Loss: 0.000848 Validation Loss: 0.096268 Accuracy: 75.2%
Epoch: 43 LR: 0.0003 Training Loss: 0.000841 Validation Loss: 0.09644 Accuracy: 77.2%
Accuracy increased, saving model
Epoch: 44 LR: 0.0003 Training Loss: 0.00082 Validation Loss: 0.09533 Accuracy: 75.2%
Epoch: 45 LR: 0.0003 Training Loss: 0.000832 Validation Loss: 0.097919 Accuracy: 76.0%
Epoch: 46 LR: 0.0003 Training Loss: 0.00082 Validation Loss: 0.098356 Accuracy: 74.8%
Epoch: 47 LR: 0.0003 Training Loss: 0.000845 Validation Loss: 0.099007 Accuracy: 75.6%
Epoch: 48 LR: 0.0003 Training Loss: 0.000849 Validation Loss: 0.098156 Accuracy: 75.2%
Epoch: 49 LR: 0.0003 Training Loss: 0.000772 Validation Loss: 0.098841 Accuracy: 74.4%
Epoch: 50 LR: 0.0002 Training Loss: 0.000755 Validation Loss: 0.096022 Accuracy: 75.2%
# torch.save(model_transfer.state_dict(), 'model_transfer.pt')
#-#-# Do NOT modify the code below this line. #-#-#
# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
if use_cuda:
    model_transfer.cuda()
Try out your model on the test dataset of landmark images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 60%.
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.96451
Test Accuracy: 78% (978/1250)
Great job creating your CNN models! Now that you have put in all the hard work of creating accurate classifiers, let's define some functions to make it easy for others to use your classifiers.
Implement the function predict_landmarks, which accepts a file path to an image and an integer k, and then predicts the top k most likely landmarks. You are required to use your transfer learned CNN from Step 2 to predict the landmarks.
An example of the expected behavior of predict_landmarks:
>>> predicted_landmarks = predict_landmarks('example_image.jpg', 3)
>>> print(predicted_landmarks)
['Golden Gate Bridge', 'Brooklyn Bridge', 'Sydney Harbour Bridge']
from PIL import Image
## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)
model = model_transfer
classes = classes_transfer["train"]
def predict_landmarks(img_path, k):
    image = Image.open(img_path)
    data = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        to_tensor_transform,
    ])(image)
    # add a batch dimension
    data.unsqueeze_(0)
    ## return the names of the top k landmarks predicted by the transfer learned CNN
    with torch.no_grad():
        # set the module to evaluation mode
        model.eval()
        # move to GPU
        if use_cuda:
            data = data.to('cuda', non_blocking=True)
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # take the k classes with the highest scores
        pred = output.data[0].topk(k).indices
    # the class folder names carry a numeric prefix such as "09.", which is stripped here
    return [classes[idx][3:].replace("_", " ") for idx in pred]
# test on a sample image
predict_landmarks('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg', 5)
['Golden Gate Bridge', 'Forth Bridge', 'Brooklyn Bridge', 'Sydney Harbour Bridge', 'Sydney Opera House']
In the code cell below, implement the function suggest_locations, which accepts a file path to an image as input, and then displays the image and the top 3 most likely landmarks as predicted by predict_landmarks.
Some sample output for suggest_locations is provided below, but feel free to design your own user experience!

def suggest_locations(img_path):
    # get landmark predictions
    predicted_landmarks = predict_landmarks(img_path, 3)
    ## display image and display landmark predictions
    img = Image.open(img_path)
    fig, ax = plt.subplots(figsize=(16, 10))
    landmarks_str = ", ".join(predicted_landmarks[:2]) + " or " + predicted_landmarks[-1]
    ax.set_title(f"This image seems to show either the {landmarks_str}", fontsize=16)
    ax.imshow(img)
# test on a sample image
suggest_locations('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg')
Test your algorithm by running the suggest_locations function on at least four images on your computer. Feel free to use any images you like.
Question 4: Is the output better than you expected :) ? Or worse :( ? Provide at least three possible points of improvement for your algorithm.
Answer: The output is pretty good, although there are cases where the top suggestion is not the correct answer. Three possible points of improvement:
1. Enable the FiveCrop test-time augmentation that is already prepared (but commented out) in the test transforms, and average the predictions over the five crops for a more robust estimate; see the sketch after this answer.
2. Fine-tune some of the frozen feature-detector layers of VGG16 with a small learning rate instead of training only the classifier.
3. With more GPU memory, use a larger batch size (or an additional linear layer in the scratch model) together with more aggressive data augmentation to reduce overfitting.
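A minimal sketch of the first point, reusing the FiveCrop transform mentioned in the test transforms above. This is an illustration only: the helper name predict_landmarks_tta is hypothetical, and it assumes the model, classes, to_tensor_transform, and use_cuda variables from the cells above.
def predict_landmarks_tta(img_path, k):
    image = Image.open(img_path)
    # five 224x224 crops: the four corners plus the center
    crops = transforms.Compose([
        transforms.Resize(256),
        transforms.FiveCrop(224),
        transforms.Lambda(lambda cs: torch.stack([to_tensor_transform(c) for c in cs])),
    ])(image)
    if use_cuda:
        crops = crops.cuda()
    with torch.no_grad():
        model.eval()
        # average the class probabilities over the five crops
        probs = torch.softmax(model(crops), dim=1).mean(dim=0)
    return [classes[i][3:].replace("_", " ") for i in probs.topk(k).indices]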
## Execute the `suggest_locations` function on
## at least 4 images on your computer.
## Feel free to use as many code cells as needed.
suggest_locations('images/01 wall of china.jpg')
suggest_locations('images/02 dwarf.jpg')
suggest_locations('images/03 eiffel tower.jpg')
suggest_locations('images/04 sydney.jpg')
suggest_locations('images/05 macchu picchu.jpg')